Feature Selection by Distributions Contrasting

Authors

  • Varvara V. Tsurko
  • Anatoly I. Michalski
Abstract

We consider the problem of selecting the set of features that is most significant for partitioning two given data sets. The selection criterion to be maximized is the symmetric information distance between the distributions of the feature subset in the two classes. These distributions are estimated with a Bayesian approach under uniform priors, and the symmetric information distance is bounded from below using a Rademacher penalty and inequalities from empirical process theory. The approach was applied to a real example of selecting a set of manufacturing process parameters to predict one of two states of the process. It was found that only 2 of the 10 parameters were enough to recognize the true state of the process with an error level of 8%. The set of parameters was found on the basis of 550 independent observations in the training sample. The performance of the approach was evaluated using 270 independent observations in the test sample.
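As a rough illustration of this criterion, the sketch below (Python) scores a candidate feature subset by the symmetric Kullback-Leibler information distance between the two class-conditional distributions, each estimated with a uniform (add-one) prior over the observed cells. The function names, the restriction to already discretized features, and the exhaustive subset search are assumptions made for illustration; the Rademacher-penalty lower bound and the empirical-process inequalities used in the paper are not reproduced here.

import itertools
import numpy as np

def class_conditional_probs(X, cells, alpha=1.0):
    # Histogram estimate over the observed cells with a uniform (add-alpha) prior.
    counts = np.zeros(len(cells))
    for row in X:
        counts[cells.index(tuple(row))] += 1
    return (counts + alpha) / (counts.sum() + alpha * len(cells))

def symmetric_divergence(p, q):
    # Symmetric information distance J(p, q) = KL(p || q) + KL(q || p).
    return float(np.sum((p - q) * np.log(p / q)))

def subset_score(X0, X1, feature_idx):
    # Contrast between the two classes restricted to one candidate feature subset.
    A, B = X0[:, feature_idx], X1[:, feature_idx]
    cells = sorted({tuple(row) for row in np.vstack([A, B])})
    return symmetric_divergence(class_conditional_probs(A, cells),
                                class_conditional_probs(B, cells))

def select_features(X0, X1, subset_size):
    # Exhaustive search for the subset with the largest class contrast.
    candidates = itertools.combinations(range(X0.shape[1]), subset_size)
    return max(candidates, key=lambda idx: subset_score(X0, X1, list(idx)))

In the application described above, 2 of 10 parameters sufficed, so an exhaustive search over the C(10, 2) = 45 candidate pairs would be inexpensive; for larger feature sets a greedy or otherwise penalized search would be needed.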

Related articles

Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which the classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion for measuring distances between distributions. In the p...
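For reference (the snippet above is truncated), the Chernoff distance between two Gaussian class densities N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2) has the standard form, for a mixing parameter \alpha \in (0, 1):

C_\alpha = \frac{\alpha(1-\alpha)}{2}\,(\mu_2-\mu_1)^{\top}\left[\alpha\Sigma_1+(1-\alpha)\Sigma_2\right]^{-1}(\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|\alpha\Sigma_1+(1-\alpha)\Sigma_2\right|}{\left|\Sigma_1\right|^{\alpha}\left|\Sigma_2\right|^{1-\alpha}}

With \alpha = 1/2 this reduces to the Bhattacharyya distance, and, unlike the between-class scatter used by classical linear discriminant analysis, it remains informative when \Sigma_1 \neq \Sigma_2. Whether the cited work uses exactly this Gaussian form cannot be confirmed from the truncated snippet.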

Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

Objective(s): This study addresses feature selection for breast cancer diagnosis. The proposed process uses a wrapper approach with GA-based feature selection and a PS-classifier. The experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...

Enhancing feature selection with feature maximization metric

This paper deals with a new feature selection and feature contrasting approach for classification of highly unbalanced textual data with a high degree of similarity between associated classes. The efficiency of the approach is illustrated by its capacity to enhance the classification of bibliographic references into a patent classification scheme. A complementary experiment is performed on a no...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...

On the Feature Selection Criterion Proposed in ‘Gait Feature Subset Selection by Mutual Information’

Recently, Guo and Nixon [1] proposed a feature selection method based on maximizing I(x; Y), the multidimensional mutual information between the feature vector x and the class variable Y. Because computing I(x; Y) can be difficult in practice, Guo and Nixon proposed an approximation of I(x; Y) as the criterion for feature selection. We show that Guo and Nixon's criterion originates from ap...
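For context, I(x; Y) is the standard mutual information between the feature vector x and the class variable Y; for discrete x it reads

I(x; Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}

Estimating this quantity directly becomes impractical as the dimension of x grows, which is why the cited work resorts to an approximation (not reproduced here).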

Journal title:

Volume   Issue

Pages  -

Publication date: 2014